GOAL:
- Analyze time series of achievement histories of a sample of
profiles.
- Can we forecast user engagement?
Create Files Directory
- Gets file directories of all CSVs to pull for analysis.
- Applies transformations so that the last-scraped dates share a
consistent format.
directory_df = create_file_directory()
directory_df = directory_transformations(directory_df)
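The helper bodies are not shown in this document. As a rough illustration of the date-format step only, the sketch below coerces mixed date strings to `Date`. The name `normalize_dates_sketch` and the two formats are assumptions, not the actual logic of `directory_transformations()`.

```r
# Hypothetical helper: NOT the author's directory_transformations().
# Tries each assumed format in turn and keeps the first successful parse.
normalize_dates_sketch <- function(x, formats = c("%Y-%m-%d", "%m/%d/%Y")) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)                        # values not parsed yet
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# Made-up input mixing two formats and one unparseable value
last_scraped_demo <- normalize_dates_sketch(c("2023-03-15", "03/15/2023", "unknown"))
```

Values that match none of the formats stay `NA`, so they can be counted and inspected rather than silently dropped.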
Read Manifests
- Load the full leaderboard of TA for exploratory data analysis.
- Read the full achievements manifest for later analysis.
lb_df = read.csv("./data/leaderboard/leaderboard.csv")
lb_df = lb_feature_transformations(lb_df)
achievements_manifest = read.csv("./data/manifest/achievements_manifest.csv")
Read Sample of Gamers
- We take a sample of 200 profiles using our file directory and order
them.
set.seed(196)
rnd_gamer_sample = sample_random_gamers(200, directory_df = directory_df)
rnd_gamer_sample = lapply(rnd_gamer_sample, function(x) x[order(rnd_gamer_sample[[3]])])
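The `lapply` line above reorders every element of the sample by its third element. A minimal sketch of that pattern on made-up data, assuming the sample is a list of parallel vectors (the real structure returned by `sample_random_gamers()` is not shown here):

```r
# Made-up stand-in for the sample: parallel vectors stored in a list,
# where the third element is the key to sort by (an assumption).
demo_sample <- list(
  gamertag = c("gamer_b", "gamer_c", "gamer_a"),
  path     = c("b.csv", "c.csv", "a.csv"),
  sort_key = c(20, 30, 10)
)

# Reorder every element by the third element so they stay aligned
demo_sorted <- lapply(demo_sample, function(x) x[order(demo_sample[[3]])])

# Resetting the seed before each draw makes the sample repeatable
set.seed(196)
first_draw <- sample(1000, 5)
set.seed(196)
second_draw <- sample(1000, 5)
```

Because the ordering index is computed from the original list inside the function, each element is permuted the same way.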
Metrics Preprocessing (Total)
- Outputs total observations
- Processes the metrics data frame for all profiles
print(paste("TOTAL OBSERVATIONS:", get_total_observations(rnd_gamer_sample[[1]])))
## [1] "TOTAL OBSERVATIONS: 401773"
metrics_df = process_metrics_df(rnd_gamer_sample, directory_df)
Frequency Data Preprocessing
- Intermediate step that prepares per-profile data for later analysis.
frequency_dfs = achievement_calculate_frequencies(rnd_gamer_sample)
frequency_combined_df = bind_rows(frequency_dfs, .id = "data_frame_id")
frequency_combined_df$data_frame_id = as.numeric(frequency_combined_df$data_frame_id)
da_df = calculate_daily_achievements(frequency_combined_df)
da_df = da_fill_dates(da_df)
da_profiles = da_split_by_profile(da_df)
da_profiles = da_profiles_set_churn(da_profiles)
## [1] "PROFILE: 150 DROPPED (All NA values)"
da_profiles = da_profiles_set_days_existence(da_profiles)
da_profiles = calculate_daily_lt_eir(da_profiles)
da_profiles = calculate_weekly_eir_all(da_profiles)
da_profiles = calculate_monthly_eir_all(da_profiles)
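The daily-count and date-filling steps can be pictured with a small base R sketch. `fill_daily_counts_sketch` is a hypothetical name, and the logic is an assumption about what `calculate_daily_achievements()` and `da_fill_dates()` do, not their actual code.

```r
# Hypothetical sketch: count achievements per day, then insert zero rows
# for days with no achievements between the first and last unlock.
fill_daily_counts_sketch <- function(dates) {
  dates <- as.Date(dates)
  counts <- table(dates)                       # achievements per observed day
  all_days <- seq(min(dates), max(dates), by = "day")
  out <- data.frame(date = all_days, achievements = 0L)
  idx <- match(names(counts), as.character(all_days))
  out$achievements[idx] <- as.integer(counts)
  out
}

# Made-up unlock dates for one profile
daily_demo <- fill_daily_counts_sketch(c("2023-01-01", "2023-01-01", "2023-01-04"))
```

Filling the gaps with explicit zeros gives every profile a regular daily series, which time series methods generally expect.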
Frequency Plots by Profile
- The app lets you select the profile and the temporal metric.
- Note: This Shiny app won’t display in the
self-contained HTML file. To interact with the app, you can run the RMD
document in an R Markdown viewer or in the RStudio IDE.
Churned @ 365 Days Histogram
- Most users in this sample (approximately 75%) have not churned under
this definition.
# Bar chart of churn status with distinct colors for FALSE, TRUE, and NA
# (named values keep the colors tied to the right level; NA uses na.value)
ggplot(metrics_df, aes(x = churned, fill = factor(churned))) +
  geom_bar(color = "white") +
  scale_fill_manual(values = c("FALSE" = "darkgreen", "TRUE" = "darkred"), na.value = "gray") +
  labs(title = "Churned Histogram (365 Days Since Last Achievement)", x = "Churned Status", y = "Count")
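The plot title defines churn as 365 days since the last achievement. A minimal sketch of such a flag follows, assuming the gap is measured against a reference date such as the last-scraped date. `is_churned_sketch`, the reference date, and the `>=` comparison are all assumptions; the actual rule lives in `da_profiles_set_churn()`, which is not shown.

```r
# Hypothetical sketch of a churn flag: TRUE when at least `threshold_days`
# have passed between the last achievement and the reference date.
is_churned_sketch <- function(last_achievement, reference_date, threshold_days = 365) {
  gap_days <- as.numeric(as.Date(reference_date) - as.Date(last_achievement))
  gap_days >= threshold_days                   # NA dates give NA, not FALSE
}

# Made-up dates: one long-inactive profile, one recent, one with no data
churn_demo <- is_churned_sketch(
  last_achievement = c("2022-01-01", "2023-01-01", NA),
  reference_date   = "2023-06-01"
)
```

Leaving missing dates as `NA` keeps profiles without usable history separate from the TRUE and FALSE groups.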

Longest Streak Histogram
- Most users have a longest streak of 4 or 5 days.
- The distribution of longest streaks in this sample is roughly normal.
ggplot(metrics_df, aes(x = longest_streak, fill = factor(longest_streak))) +
  geom_bar(color = "white") +
  labs(title = "Streak Histogram", x = "Longest Streak (in Days)", y = "Count")
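One way to compute a longest streak from unlock dates is run-length encoding over day-to-day gaps. This is a sketch under the assumption that a streak means consecutive calendar days with at least one achievement. `longest_streak_sketch` is a hypothetical name, not the function behind `longest_streak`.

```r
# Hypothetical sketch: longest run of consecutive calendar days
# on which at least one achievement was unlocked.
longest_streak_sketch <- function(dates) {
  days <- sort(unique(as.Date(dates)))         # sort() also drops NA dates
  if (length(days) == 0) return(0L)
  gaps <- as.integer(diff(days))               # days between active days
  runs <- rle(gaps == 1L)                      # runs of back-to-back days
  consecutive <- runs$lengths[runs$values]
  if (length(consecutive) == 0) return(1L)     # only isolated days
  max(consecutive) + 1L                        # n gaps of 1 span n + 1 days
}

# Made-up dates: a 3-day run, a one-day break, then a 2-day run
streak_demo <- longest_streak_sketch(
  c("2023-01-01", "2023-01-02", "2023-01-03", "2023-01-05", "2023-01-06")
)
```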

Game Time Box Plot
- Most players have logged game time in the thousands of hours, with
several outliers above 10,000 hours.
- This plot only shows Xbox One and Series X|S titles.
# Create the box plot for game time (minutes converted to hours)
ggplot(metrics_df, aes(x = "", y = total_game_time_minutes / 60, fill = "Game Time")) +
  geom_boxplot(
    width = 0.5, position = position_dodge(width = 0.9), color = "black",
    outlier.color = "darkred", outlier.shape = 16, outlier.size = 3
  ) +
  labs(x = "", y = "Game Time (Hours)", fill = "") +
  scale_fill_manual(values = "#FF7F00") +
  theme(legend.position = "top", legend.title = element_blank()) +
  scale_y_continuous(labels = scales::comma) +
  coord_flip()
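The points drawn as outliers follow the default box plot rule: values more than 1.5 times the interquartile range beyond the quartiles. The sketch below reproduces that rule on made-up hours. `flag_outliers_sketch` is a hypothetical helper and is not part of the analysis code.

```r
# Hypothetical helper applying the 1.5 * IQR rule used by default box plots.
flag_outliers_sketch <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Made-up game time in hours, with one extreme value
hours_demo <- c(900, 1200, 1500, 1800, 2100, 2400, 12000)
outlier_flags <- flag_outliers_sketch(hours_demo)
```

Here the quartiles are 1,350 and 2,250 hours, so the upper fence is 3,600 hours and only the 12,000-hour value is flagged.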

App Time Box Plot
- We filter out the 138 zero values, which belong to users who don’t use
apps on Xbox.
- Of the 62 players who do use apps on Xbox, most sit at or below
2,000 hours. This suggests that users with significant app time on
their profile rely on Xbox for the apps tracked.
- This plot only shows Xbox One and Series X|S titles.
# Create the box plot for app time, excluding profiles with zero app time
ggplot(metrics_df[metrics_df$total_app_time_minutes > 0,], aes(x = "", y = total_app_time_minutes / 60, fill = "App Time")) +
  geom_boxplot(
    width = 0.5, position = position_dodge(width = 0.9), color = "black",
    outlier.color = "darkblue", outlier.shape = 16, outlier.size = 3
  ) +
  labs(
    x = "", y = "App Time (Hours)", fill = "",
    caption = paste("Number of Zero Values Filtered Out:", sum(metrics_df$total_app_time_minutes == 0))
  ) +
  scale_fill_manual(values = "#1F78B4") +
  theme(legend.position = "top", legend.title = element_blank()) +
  scale_y_continuous(labels = scales::comma) +
  coord_flip()
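The zero filter and the caption count in the plot code can be checked in isolation. A small sketch on made-up minutes (the values are illustrative only and are not taken from `metrics_df`):

```r
# Made-up app time in minutes; zeros stand for profiles with no app usage
app_minutes_demo <- c(0, 0, 120, 0, 6000, 90000)

has_app_time <- app_minutes_demo > 0           # same condition as the row filter
app_hours_demo <- app_minutes_demo[has_app_time] / 60
zero_count_demo <- sum(app_minutes_demo == 0)  # same count as the caption
```

If the column could contain `NA`, the logical row filter would return `NA` rows, so wrapping the condition in `which()` may be safer in that case.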

Game vs App Time Scatter Plot
- Most players have no logged app time, regardless of game time. For
this sample, that suggests most players consume app content outside of
Xbox.
ggplot(metrics_df, aes(x = total_game_time_minutes / 60, y = total_app_time_minutes / 60, color = total_app_time_minutes / 60)) +
  geom_point() +
  labs(x = "Total Game Time (Hours)", y = "Total App Time (Hours)", color = "Total App Time (Hours)") +
  scale_color_gradient(low = "blue", high = "red") +
  ggtitle("Total Time: Game vs App (Hours)") +
  scale_x_continuous(labels = scales::comma) +
  scale_y_continuous(labels = scales::comma)
